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1.  Introduction 

[UC]  introduced  the  idea  of  an  ultracomputer  and  reviewed  various  basic 
algorithms  for  such  an  ensemble  of  processors  containing  "shuffle"  interconnec- 
tions. Later,  Harrison  and  Kalos  introduced  the  idea  of  an  N- dimensional  ultra- 
computer  (N-D  UC),  which  is  well  suited  for  implementing  many  algorithms. 
The  main  result  of  this  note  is  that  an  N-D  UC  can  simulate  the  right/left  and 
shuffle/unshuffle  connections  of  its  1-dimensional  counterpart  in  N  and  2N  steps, 
respectively.  Thus,  for  each  N,  an  N-D  UC  is  asymptotically  as  fast  as  a  1-D  UC 
of  the  same  size. 

In  Section  11  we  describe  N-D  UCs,  in  Section  HI  we  prove  our  main  result 
along  v/ith  some  related  theorems,  and  in  Section  IV  we  consider  a  couple  of 
modified  N-D  UC  models  which  yield  slightly  better  results. 

2.  The  Model 

In  this  section  we  give  the  definitions  necessary  for  reading  this  note. 

The  size  of  a  parallel  processor  is  the  number  of  processing  elements  (PEs)  it 
contains.  We  denote  this  processor  size  by  P  and  number  the  processing  elements 
"tig*  •••»  "*^.i- 

Def.  Given  a  parallel  processor  of  size  P  =  2^,  let  b  .  b  ,  •••bQ  be  the  binary 
representation  of  B.  Then  in  a  7-D  ultracomputer  each  PEg  is  directly  connected 
to  (its  four  neighbors): 

Left  PE 

Shnfne  PE(^_^h^_3 . . .  hob^_^)  (i.e.  a  left  rotate); 

Uoshufne      PE(b^_jb,_j...bi)  (i-e.   a  right  rotate). 

Intuitively,  an  N-D  UC  is  an  N-dimensional  rectilinear  grid  of  PEs  where 
every  row  (in  each  dimension)  is  a  1-D  UC. 


N 
Def.   A  dj^xdj^jX  ...  xdj  N-D  t/C  is  a  set  of  J^dj  PEs,  each  a  different  member 

i=l 

of  the  set  of  tuples  dj^xdj^.jX  ...xdj  such  that  PE(ej,, . . .  ,ek eO  is  comiected 

to  PE(e„ Ck,  .  .  .  ,ei),l^k^N  ,  if f  Cj^  is  comiected  to  C]^  in  the  size  d^  1-D 

UC. 

For  example,  a  d2  x  dj  2-D  UC  is  a  dj  x  dj  rectangle  of  PEs  where  each 
row  and  each  column  is  independently  a  1-D  UC  of  size  6^  and  d^  respectively. 

Throughout  this  note  we  let  d,  1^  i^N,  be  the  size  of  the  i-th  dimension,  U. 
be  the  set  of  1-D  UCs  in  the  i-th  dimension,  and  Cj  be  the  number  cf 
PEfCN,  .  -  .  .cn-i,  .  .  .  ,ei)  within  Uj. 

A  one-to-one  correspondence  is  needed  between  each  PE  in  a  dj^  xdj^^x 
...xd.  N-D  UC  and  each  PE  in  a  1-D  UC  of  the  same  size.  The  following 
correspondence  is  called  the  1-D  counterpart  of  the  N-D  UC. 

We  number  the  PEs  from  0  to  P-1  using  the  standard  row-major  indexing  of 
an  N-dimensional  array,  i.e.  PE^CnjCn-i,  .  .  .  ,6^)  corresponds  to  the  PE  with 
address 

N  m-1 

m=l  i=l 

in  a  1-D  UC.  We  often  write  the  address  of  PE^c^,  .  .  ,  ,ei)  in  its  1-D  counter- 
part as  Cj^^.-.e^  and  refer  to  this  PE  as  PE^  or  PE^^^  ^_^  ■  ■  ■  ty  Since  each  indi- 
vidual dimension  is  a  1-D  UC  each  d.  is  a  power  of  two.  Thus,  the  lea^t  signifi- 
cant log  d^  bits  correspond  to  e^;  the  next  log  dj  bits  correspond  to  &2^...\  and  the 
most  significant  log  dj^  bits  correspond  to  e^^.  Therefore,  dealing  with  each  Uj 
merely  requires  manipulation  of  the  proper  log  d.  contiguous  bits.    All  of  the 

results  in  this  paper  rely  on  the  natural  correspondence.  Note  that  the  correspon- 
dence depends  on  the  ordering  of  the  N  dimensions  and,  in  fact,  there  are  N! 
equally  valid  1-D  counterparts  for  each  N-D  UC. 

A  potential  drawback  of  N-D  UCs  is  the  number  of  connections  to  each  PE 
(4N,  as  compared  to  4  for  a  1-D  UC).  However,  this  number  is  manageable  for 
2  and  3  dimensions,  and  moreover  there  are  compensating  advantages. 

N-D  UCs  have  embedded  N-D  mesh  connected  computers  and  are  often  use- 
ful for  those  problems  which  can  be  decomposed  into  a  multidimensional  grid, 
where  the  computation  of  a  grid  point  often  requires  efficient  access  to  data  at 
neighboring  points.  For  example,  in  fluid  dynamics  problems  each  processor  can 
store  the  data  for  a  small  region  of  space  and  the  N-D  UC  connections  provide 
rapid  access  to  neighboring  regions. 

In  various  appUcations,  subproblems  such  as  the  FFT  can  be  solved  quickly 
by  considering  each  row  of  jDrocessors  to  be  an  individual  ultracomputer.  This 
concept  can  be  generalized  by  allowing  an  N-D  UC  to  simulate  many  differently 
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shaped  ultracomputcrs  of  smaller  dimension.  Thus,  we  are  able  to  choose  sub- 
spaces  in  many  different  ways. 

This  partitioning  is  also  adaptable  to  multiprogramming.  Various  programs 
can  execute  simultaneously  (but  separately)  on  these  smaller  dimensional  UCs  as 
need  dictates.  Moreover,  if  a  few  PEs  are  disabled,  an  appropriate  partitioning 
would  allow  many  of  the  (sub  N)-D  UCs  to  function. 

Fmally,  the  physical  layout  of  a  2-D  or  3-D  UC  may  be  more  modular  than 
its  1-D  counterpart  since  each  dimension  can  be  wired  separately. 

3.   Results 

In  this  section  we  show  our  main  result,  that  any  N-D  UC  can  simulate  the 
1-D  UC  right  and  left  connections  in  N  steps  and  the  shuffle  and  unshuffle  con- 
nections in  2N  steps.  Given  oiu:  correspondence  between  PEs  in  a  1-D  UC  and 
N-D  UC  we  show  these  results  are  almost  optimal  and  also  that  1-D  UCs  need  Cl 
(log  P)  steps  to  simulate  an  N-D  UC.  Finally,  we  show  that  N-D  UCs  are  quite 
versatile,  and  allow  many  PEs  to  function  in  a  desirable  fashion,  even  when  some 
are  disabled. 

Since  the  unshuffle  and  left  connections  are  merely  the  reverse  of  the  shuffle 
and  right,  we  will  give  proofs  only  for  the  shuffle  and  right  connections. 

Def.  During  every  step  of  a  permutation  of  a  set  of  items  in  a  parallel  processor 
each  item  has  two  PE  numbers  of  special  interest:  the  PE  number  in  which  it 
currently  resides  called  its  current  number  and  the  PE  number  in  which  it  will 
finally  reside  called  its  destination  number.  We  refer  to  the  PE  number  in  which 
an  item  originates  as  its  initial  number. 

In  the  following  propositions  we  assume  that  an  N-D  UC  has  dimension  sizes 
dpdj,  ...  ,dj^  and  show  how  it  can  simulate  the  connections  of  a  1-D  UC  of  size 

i»l 

Theorem  An  N-D  UC  can  realize  the  right/left  connections  of  its  1-D  counter- 
parts in  N  steps. 

Proof  The  following  accomplishes  a  right  send:  (Sec  figure  1.)  Every  PE  sends 
right  on  Uj.    Next,  every  PE  whose  number  contains  zeros  in  its  low  order  log  dj 

bits  (i.e.,  6^  =  0)  sends  right  on  Uj.   Then,  every  PE  whose  number  contains  zeros 

in  its  low  order  log  dj  +  log  d^  bits  sends  right  on  U^.   Simulation  continues  in  this 

fashion  for  a  total  of  N  steps  and  is  easily  seen  to  accomplish  the  desired  right 
send. 
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Figure  1.   Steps  to  Simulate  a  Right  Send. 
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Initial  Number: 

Current  Number 
after  1st  right  send 

Current  Number 
after  2nd  right  send 
(and  destination  number): 

Note  that  if,  as  in  many  algorithms,  the  odd-even  connections  (i.e.  PE2J.J 
with  fE^)  are  not  needed,  then  all  right/left  connections  of  the  1-D  counterpart 
are  just  Uj  right/left  connections. 

Def.  Given  l'^  PEs,  let  PEj^^'^  be  the  PE.  where  the  binary  representation  cf  j 
matches  that  of  k  except  in  the  i-th  low  binary  representation  of  j  matches  that  of 
k  except  in  the  i-th  low  order  bit  (which  is  complemented).  Then  in  a  aibe  con- 
nected computer  each  PE  is  connected  to  PE^'\  0^i<  q.  Thus,  each  PE  has  q 
coimections. 

Proposition  A  cube  connected  computer  of  l'^  PEs  can  realize  the  shuffle  and 
unshuffle  connections  in  q  steps. 

Proof  Recall  that  a  shuffle  is  a  left  rotate  of  the  bits  of  the  PE  number.  Since 
there  are  2^  PEs,  the  initial  PE  number  of  an  item  can  differ  by  at  most  q  bits 
from  its  destination  PE  number.  Our  method  is  to  iteratively  correct  (if  neces- 
sary) each  of  the  q  bits. 

Method  Each  PE  executes  the  following: 

For  j  =  0  to  q-1 

V  items  i  in  the  PE 

IF  DEST.  (i)  ?t  CURR.  (i) 

THEN  send  i  to  PE^J^ 

where:    DEST  (i)    is  the  j-th  low  order  bit  (LOB)  of  the  destination  number  of 

item  i. 

and       CURR.(i)  is  the  j-th  low  order  bit  of  the  current  number  of  item  i. 

For  each  item  all  q  bits  are  correct  at  the  end  of  the  simulation,  so  the  algo- 
rithm obviously  carries  out  the  simulation  correctly  in  q  loop  iterations.  How- 
ever, we  must  show  that  at  each  iteration  of  the  loop,  each  PE  executes  no  more 
than  one  data  communication  step  (i.e.,  one  send  and  one  receive). 
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After  the  j-th  step,  l^j<q,  the  current  number  of  each  data  item  has  its  low 
order  j  bits  equal  to  the  low  order  j  bits  of  its  destination  number  and  its  high 
order  q-j  bits  equal  to  the  high  order  q-j  bits  of  its  initial  number.  Smce  the  des- 
tination number  is  a  left  rotation  of  the  initial  number,  the  j-th  bit  of  the  initial 
number  (i.e.  j+1  bit  of  the  destination  number)  will  be  missing  from  the  current 
number  (see  Figure  2).  Half  of  the  PEs  (those  with  HOB  =  LOB)  will  contam 
exactly  two  items  (b.=0,l),  and  the  remaming  half  of  the  PEs  will  contain  no 
items.  Moreover,  on  step  j+1,  each  PE  containing  two  items  will  send  the  one 
with  b.^b.^^. 
Figure  2.  Simulation  of  a  1-D  shuffle  on  a  CCC. 

Initial  number  bq_i  bq_2  •  •  •  bj+i  bj  •  •  •    bj  bo 

Current  number 

(after  step  j)  bq_i  bq_2   •  •  •  bj+i  bj_i    •  •  •  bo  bq_i 

Destination  number  bq_2  bq_3   •  •  •  bj  bj_i    •  •  •     bo  bq_i 

On  the  first  iteration,  half  of  the  PEs  retain  their  items  since  their  low  order 
bits  are  aheady  correct.  Each  of  these  PEs  must  also  receive  an  item  from 
exactly  one  of  the  remaining  PEs  in  order  to  satisfy  the  requirement  that  half  the 
PEs  contains  two  items.  Thus,  half  of  the  PEs  send  an  item  and  the  other  half 
receive  an  item. 

Then,  as  noted  above,  for  iterations  2  through  q-1,  the  PEs  with  b^  =  b^ 
send  and  therefore  receive  exactly  one  item.  On  the  last  iteration,  the  PEs  with 
two  items  send  one  item  to  a  unique  PE  with  no  items. 
Q.E.D. 

Theorem  An  N-D  UC  can  realize  the  shuffle  and  unshuffle  connections  of  its  1- 
D  counterpairt  in  2N  steps. 

Proof   Recall  that  the  number  of  ^E^c^^^_^ d)  i°  ^^  ^'^  counterpart  of  an 

N-D  UC  is  ej^.i...ej,  and  that  the  result  of  a  1-D  UC  shuffle  is  this  number 
rotated  left  one  bit.  We  can  realize  this  1-D  shuffle  as  foUows:  Fu^t  shuffle  in 
each  dimension  separately,  i.e.  first  all  U^s  shuffle  in  parallel,  then  all  UjS,  etc. 
(Note  the  order  of  the  UjS  is  irrelevant.)  At  this  point  the  item  in 
PE/^  ^  r  -.  will  have  been  moved  to  PECn.CN-i,  •  •  •  ,ei  where  each  e-  is  c. 

rotated  left  one  bit.  The  items  are  ahnost  m  their  correct  destmation  PEs:  ine 
low  order  bit  of  each  q  is  the  high  order  bit  of  Cj  instead  of  being  the  high  order 
bit  of  c.  J  (where  Cq=c^).  Since  the  high  order  bit  of  Cj.j  is  the  low  order  bit  of 
Ci'-i  we  need  only  perform  a  left  rotate  on  the  low  order  bits  of  the  N  CjS.  Con- 
sider each  group  of  2^^  PEs  whose  numbers  agree  in  all  places  except  these  N  bits. 
Each  forms  a  size  2^  cube  connected  computer:  the  right  and  left  connections  of 
each  U.  allow  for  the  bit  complements.  By  the  preccdmg  proposition,  the  desired 
permutation,  the  shuffle,  can  be  realized  in  N  steps. 

Pages 
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Thus,  the  entire  1-D  counterpart  shuffle  can  be  realized  in  2N  steps. 
Figure  3.   Simulation  of  a  1-D  Shuffle  on  a  3-D  UC 

Initial  Number: 

Current  Number 
after  Shuffle 
in  each  U.: 

Current  Number 
after  (LOB  of)  Uj 
is  Fuced: 

Current  Nimiber 
after  U2  is  Fixed 
(and  destination 
number): 

Corollary  For  each  N,  an  N-D  UC  can  simulate  its  1-D  coimterpart  with  no 
asymptotic  loss  of  efficiency. 

The  next  two  propositions  show  that,  given  the  way  in  which  N-D  UCs  are 
mapped  onto  1-D  UCs  the  previous  simulations  are  almost  optimal. 

Proposition  An  N-D  UC  cannot  simulate  a  right  or  left  communication  of  its  1-D 
counterpart  in  fewer  than  N  steps. 

Proof  Consider  sending  an  item  right  from  PEp.j  to  PEq  on  an  N-D  UC.  All  the 
bits  in  the  item's  initial  nimiber  are  one,  but  in  the  destination  number  they  are 
all  zero.  In  one  step,  the  N-D  UC  neighbor  communication  can  only  be  used  to 
change  the  bits  associated  with  one  of  its  dimensions   (i.e.  one  of  its  Us).   Since 

there  are  N  dimensions  the  simulation  must  take  at  least  N  steps. 
Q.E.D. 

Proposition  An  N-D  UC  cannot  simulate  a  shuffle/unshuffle  connection  of  its  1- 
D  counterpart  in  fewer  than  2N  steps  if  N  is  even,  or  in  fewer  than  2N-1  steps  if 
N  is  odd. 

Proof  Consider  a  PE  with  the  following  number:  The  HOB  in  U.  is  i  mod  2,  the 
LOB  in  U.  is  i-f-1  mod  2,  and  the  remaining  bits  of  Uj  keep  reversing  parity 
(except  for  one  bit  position  when  log  e.  is  odd).  The  I-D  coimterpart  shuffle  of 
an  item  with  such  an  initial  number  is,  as  usual,  a  left  rotate  of  the  bits.    As 
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noted  in  the  preceding  proof,  an  N-D  UC  can  only  affect  the  bits  in  a  single 
dimension  in  one  data  communication  step.  Thus,  in  general,  the  N-D  UC  must 
perform  N  independent  shuffles  to  affect  all  the  bit  reverses  necessary.  However, 
in  the  number  constructed  above,  one  bit  v/ill  be  wrong  m  each  dimension  even 
after  these  preliminary  operations,  except  for  the  bit  in  d^  when  N  is  odd.  Thus, 
when  N  is  even  N  additional  data  communication  steps  are  needed,  and  when  N 
is  odd  N-1  steps  are  needed.  Q.E.D. 

N-D  UCs  can  be  viewed  not  only  as  1-D  UCs,  but  also  as  intermediate 
dimensional  UCs  of  a  given  total  size.  This  use  is  realized  by  splitting  the  N 
dimensions  on  an  N-D  UC  into  subgroups,  and  using  each  group  to  simulate  its 
1-D  UC  counterpjart. 

1   ^"^ 
Proposition   For  each  N,  a  size  P  N-D  UC  can  simulate  -j— -  2    (-1)'  (M-i)'^  size 

P  M-D  UCs,  M  ^  N,  with  no  asymptotic  loss  of  efficiency. 


M!i=Q 


Proof  Partition  the  N  dimensions  into  M  nonempty  groups.  Let  each  group 
independently  simulate  a  1-D  UC.  Since  there  are  M  independent  dimensions  of 
1-D  UCs,  by  definition  the  entire  connection  scheme  is  an  M-D  UC. 

The  number  of  differently  configured  M-D  UCs  simulated  is  just  the  number 
of  ways  of  placing  the  N  original  dimensions  into  M  nonempty  groups  [Feller,  pp. 
60]: 

1   M-i 

^2(-i)'(M-i)^ 

^^'■'  i=Q 

This  versatility  result  also  allows  various  configurations  to  continue  to  func- 
tion even  if  some  PEs  are  disabled. 

N-M 
Theorem    Given  a  size  P  N-D  UC,  with  d^^d^^  ...  ^dj^,M^N,  and  B^  J][    ^ 

N-M  '=^ 

disabled  PEs,  then  H  ^i  "  ^  ^'^  ^^  ^^^^  ^  simulated  simultaneously,  with 
no  asymptotic  loss  of  efficiency,  utilizing  P  -  B       J|       d.  PEs. 

i=N-M+l 

N-M 

Proof    By  ignoring  the  connections  in  the  first  N-M  dimensions,    Y\    ^   M-D 

i=i 
UCs,  of  dimension  dj^  j^_j_^  x  ...    xdj^,  can  be  simulated  by  a  fully  functional  N- 

D  UC.    If  there  are  B  disabled  PEs  then  at  most  B  of  these  M-D  UCs  cannot  be 

N-M  N 

used,  leaving    p[    dj  -  B  functioning.    Each  M-D  UC  utilizes       [j        d.  PEs. 

i=l  i=N-m+l 

Q.E.D. 
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In  the  preceding  pages  we  have  just  seen  that  N-D  UCs  are  as  powerful  as 
1-D  UCs.  Next  we  go  on  to  demonstrate  that  the  converse  is  not  true,  at  least 
for  our  natural  correspondence. 

Theorem  For  every  given  N  >  1,  any  size  P  1-D  UC  requires  ft(log  P)  data  com- 
munication steps  to  simulate  some  N-D  UC  counterpart. 

Proof  Assume,  without  loss  of  generality,  that  P  is  an  even  power  of  two  (i.e.  P 

=  2^0.   Consider  the  VP  x  VP  2-D  UC  and  the  connection  between  PE,g  q.  and 

PE(.j  Qy    Under  our  natuiral  correspondence,  these  PEs  correspond  to  PEq  and 

PE^  p  in  the  1-D  UC.   Both  PE  numbers  (0  and  VP)  have  binary  representations 

that  are  completely  zero  except  for  a  one  in  the  VP  bit  position  of  PE,^  q/s 

number.  The  fastest  method  to  make  this  bit  position  a  one,  but  to  leave  it  sur- 
rounded by  zeros,  is  to  add  one  (right  send)  and  then  left  rotate  (shuffle)  VP 
times.  Thus,  the  result  holds  f or  N  =  2.  For  every  N  >  2,  this  connection  also 
exists  in  some  N-D  UC  (i.e.  whenever  d^x  d^x  ...  x  dj  =  VP,  where  1  ^  i  ^ 

N),  and  the  result  follows.  Q.E.D. 

4.   Extended  Multidimensional  Ultracomputer  Models 

In  this  section  we  extend  the  previous  N-D  UC  model,  by  adding  extra  con- 
nections, to  increase  efficiency  in  simulating  smaller  dimensioned  UCs.  We 
require  that  each  PE  has  at  most  4N  connections  so  all  connections  must  be  added 
without  compromising  this  requirement.  This  can  be  accomplished  by  exploiting 
the  redundant  shuffle  and  unshuffle  connections  in  those  PEs  at  the  end  of  a  sin- 
gle dimension  (e.g.,  PEq  and  PE(i|_i  within  Uj). 

Model  1  Connections  are  added  to  the  standard  N-D  UC  to  allow  the  1-D  coun- 
terpart right/left  communications  to  be  accomplished  in  one  step.  A  connection  is 
added  from  PE^  to  PE^^p  where  the  binary  representation  of  B  has  all  ones  in 
the  low  order  log  (dj'dj^'  ...  'd^)  bits  but  not  in  the  next  log  (dj+j)  low  order  bits. 
Thus,  an  extra  connection  is  added  only  to  some  of  the  PEs  which  are  at  the  end 
of  a  dimension  (these  have  two  spare  connections  in  the  original  model),  so  that 
the  bound  on  the  number  of  connections  per  PE  is  maintained.  When  the  odd- 
even  PE  pairs  interact  in  this  model  the  nimiber  of  communication  steps  is 
reduced  from  N  to  1.  Note  that  NI  different  1-D  counterparts  can  be  simulated 
by  rearranging  the  N  dimensions  but  one  such  configuration  must  be  fixed  in 
advance. 

Model  2  We  extend  the  previous  model,  again  by  adding  connections,  so  that  for 
M  s  N  any  of  the  M  dimensions  can  simulate  the  right/left  connections  of  its  1-D 
counterpart  in  one  step.   In  the  previous  model  we  were  only  concerned  with  the 
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standard  counterpart,  here  1-D  counterparts  to  any  of  the  possible  M-D  UCs  (M 
^  N)  must  meet  the  requirement. 

Recall  the  simulation  of  the  1-D  counterpart  right  connection  in  the  original 
model:  Let  \Ai2  ,...,  dj^^  be  the  M  dimensions  of  a  (sub  N)  -  D  UC.  First  each 
PE  sends  right  on  Uj^.  If  a  PE  has  -^^^  =  0  it  sends  right  again,  however  this  time 
on  Ujj.  Each  PE  continues  sending  right  in  this  fashion  until,  for  some  1  s  k  ^ 
M,  ej^  i=  0.  Obviously,  a  send  right  from  a  PE  with  q  =  d^  -1  transfers  data  to  a 
PE  with  ej.  =  0.  Thus,  in  order  to  handle  all  possible  (sub  N)  -D  UCs,  a  PE  at 
the  tail  end  of  dimension  k  (i.e.  c^  =  d^^-l)  is  connected  to  the  PE  which  is  at  the 
other  end  of  dimension  k  and  also  is  the  right  neighbor  in  some  other  dimension  j 
(i.e.  PE(  ej...dk-i)  ^^  connected  to  PE(  e,  o  where  j  ^  k  and  e.  i=^  d-l).  Of 
course,  this  is  generalized  for  PEs  that  are  on  the  tail  end  of  more  than  one 
dimension. 

As  the  number  of  dimension  grows,  the  number  of  connections  on  some  PEs 
grows  exponentially.  Only  for  N  s  3  do  all  PEs  have  no  more  than  4N  connec- 
tions but  these  are  the  multidimensional  UCs  of  most  practical  interest. 


5.   Conclusion 

The  N-D  UC  architecture  looks  promising.  We  have  shown  that  it  can  simu- 
late a  1-D  UC  efficiently.  We  also  believe  that  N-D  UCs  can  be  used  to  handle 
certain  problems,  such  as  compressible  flows,  more  efficiently  than  1-D  UCs. 

As  we  have  shown,  all  ultracomputcr  algorithms  are  efficiently  implement- 
able  on  an  N-D  UC.  But  using  the  connections  of  an  N-D  UC  to  simulate  the  1- 
D  UC  connections  is  only  one  way  of  using  it:  many  problems,  such  as  summing 
and  permutation,  can  be  handled  by  more  efficient  algorithms  tailored  to  the  N-D 
UC  connections. 

One  issue  not  addressed  in  the  preceding  is  asynchronous  communication. 
Up  to  this  point,  v/c  have  assimied  the  data  communication  of  the  1-D  counter- 
part to  consist  of  synchronous  shuffles,  unshuffles,  rights,  and  lefts.  When  this  is 
not  the  case,  i.e.  when  the  connections  in  a  simulated  1-D  ultracomputcr  are  used 
to  accomplish  a  mixture  of  shuffles,  unshiiffles,  rights,  and  lefts  in  a  single  cycle 
simulation  is  not  quite  so  smooth.  An  obvious  solution  is  to  lengthen  each  simu- 
lation step  so  that  first  the  shuffle  communications  are  executed,  then  the  unshuf- 
fles, then  the  rights  and  finally  the  lefts.  Since  this  can  be  done,  even  an  'inho- 
mogeneous'  communication  step  of  the  kind  we  have  just  been  considering  can  be 
simulated  in  at  most  6N  steps. 
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